Layered Speech-Act Annotation for Spoken Dialogue Corpus
Abstract
This paper describes the design of speech act tags for spoken dialogue corpora and their evaluation. Compared with the tags used for conventional corpus annotation, the proposed speech intention tag is specific enough to determine system operations. However, describing information in more detail increases the number of tag types, which makes tag selection ambiguous. We have therefore designed an organization of tags that focuses on layered tagging and context-dependent tagging. Over 35,000 utterance units in the CIAIR corpus have been tagged by hand. To evaluate the reliability of the intention tag, a tagging experiment was conducted in which the tags assigned by several annotators were compared using the kappa value. The results confirmed that reliable data can be built. A corpus annotated with speech intention tags could be used widely, from basic research to applications of spoken dialogue, and would play an important role in the practical use of spoken dialogue corpora.
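The kappa-based reliability check mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which kappa variant was used, so this example assumes Cohen's kappa for two annotators; the function name `cohen_kappa` and the tag labels (`request`, `confirm`, `inform`) are hypothetical and are not taken from the paper's tag set.

```python
from collections import Counter


def cohen_kappa(tags_a, tags_b):
    """Cohen's kappa for two annotators who tagged the same utterance units.

    Assumption: two annotators and a flat tag set. The paper's layered tags
    and its exact kappa variant are not specified in the abstract.
    """
    if len(tags_a) != len(tags_b) or not tags_a:
        raise ValueError("both annotators must tag the same non-empty set of units")

    n = len(tags_a)
    # Observed agreement: fraction of units given the same tag by both annotators.
    observed = sum(a == b for a, b in zip(tags_a, tags_b)) / n

    # Chance agreement: product of each annotator's marginal tag frequencies.
    freq_a = Counter(tags_a)
    freq_b = Counter(tags_b)
    expected = sum(freq_a[tag] * freq_b[tag] for tag in freq_a) / (n * n)

    if expected == 1.0:
        # Both annotators used one identical tag throughout; agreement is total.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical tags for six utterance units (illustrative only).
annotator_a = ["request", "request", "confirm", "inform", "inform", "request"]
annotator_b = ["request", "confirm", "confirm", "inform", "inform", "request"]

kappa = cohen_kappa(annotator_a, annotator_b)
print(f"kappa = {kappa:.2f}")  # kappa = 0.75
```

Here the annotators agree on 5 of 6 units (observed agreement 0.83), while chance agreement from their tag frequencies is 0.33, giving a kappa of 0.75. With more than two annotators, a multi-rater statistic such as Fleiss' kappa would be the usual substitute.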
Related papers
Dialogue Acts Annotation for NICT Kyoto Tour Dialogue Corpus to Construct Statistical Dialogue Systems
This paper introduces a new corpus of consulting dialogues designed for training a dialogue manager that can handle consulting dialogues through spontaneous interactions from the tagged dialogue corpus. We have collected more than 150 hours of consulting dialogues in the tourist guidance domain. This paper outlines our taxonomy of dialogue act (DA) annotation that can describe two aspects of an...
Construction of Structurally Annotated Spoken Dialogue Corpus
This paper describes the structural annotation of a spoken dialogue corpus. By statistically dealing with the corpus, the automatic acquisition of dialogue-structural rules is achieved. The dialogue structure is expressed as a binary tree and 789 dialogues consisting of 8150 utterances in the CIAIR speech corpus are annotated. To evaluate the scalability of the corpus for creating dialogue-struc...
Czech-Sign Speech Corpus for Semantic Based Machine Translation
This paper describes progress in a development of the human-human dialogue corpus for machine translation of spoken language. We have chosen a semantically annotated corpus of phone calls to a train timetable information center. The phone calls consist of inquiries regarding their train traveler plans. Corpus dialogue act tags incorporate abstract semantic meaning. We have enriched a part of th...
A human-human train timetable dialogue corpus
This paper describes progress in a development of the human-human dialogue corpus. The corpus contains transcribed user’s phone calls to a train timetable information center. The phone calls consist of inquiries regarding their train traveler’s plans. The corpus is based on dialogues’s transcription of user’s inquiries that were previously collected for a train timetable information center. We e...
Influence of contextual information in emotion annotation for spoken dialogue systems
In this paper, we study the impact of considering context information for the annotation of emotions. Concretely, we propose the inclusion of the history of user–system interaction and the neutral speaking style of users. A new method to automatically include both sources of information has been developed making use of novel techniques for acoustic normalization and dialogue context annotation....